Abstract: In this short paper, we are interested in the normalizing constant of one of the most
commonly used absolutely continuous distributions in statistics and probability, namely Student's
t distribution. We show that the normalizing constant of a (multivariate) t distribution converges
monotonically towards the normalizing constant of a (multivariate) Gaussian distribution. As it
turns out, this monotonicity changes from increasing in dimension one to decreasing in dimensions
$k \geq 3$ whilst being constant in dimension two. We discuss some surprising "dimension-based"
interpretations of this change of monotonicity in terms of kurtosis and illustrate the effect of the
dimension by means of density plots. Our findings should thus contribute to a clearer understanding
of the concept of heavy tails in high dimensions.
1. Introduction.

Though already introduced by Helmert (1875), Lüroth (1876) and Pearson (1895), the (univariate) t
distribution is usually attributed to William Sealy Gosset who, under the pseudonym Student in 1908
(see Student 1908), (re-)defined this probability distribution, whence the commonly used expression
Student t distribution. This terminology was coined by Sir Ronald A. Fisher in Fisher (1925), a
paper that contributed greatly to making the t distribution well-known. This early success
motivated researchers to generalize the t distribution to higher dimensions; the resulting multivariate t
distribution has been studied, inter alia, by Cornish (1954) and Dunnett and Sobel (1954). The success
story of the Student t distribution went on, and nowadays it is one of the most commonly used
absolutely continuous distributions in statistics and probability. It arises in many situations, including
Bayesian analysis, estimation, hypothesis testing and the modeling of financial data. For a review
of the numerous theoretical results and statistical aspects, we refer to Johnson et al. (1994) for the
one-dimensional and to Johnson and Kotz (1972) for the multi-dimensional setup.
Under its most common form, the $k$-dimensional t distribution admits the density

\[
f_{\mu,\Sigma,\nu}(x) = c_{\nu,k}\,|\Sigma|^{-1/2}\left(1 + \|\Sigma^{-1/2}(x-\mu)\|^2/\nu\right)^{-\frac{\nu+k}{2}}, \quad x \in \mathbb{R}^k,
\]

with location parameter $\mu \in \mathbb{R}^k$, scatter parameter $\Sigma \in \mathcal{S}_k$, the class of symmetric and positive definite
$k \times k$ matrices, and tail weight parameter $\nu \in \mathbb{R}_0^+$, and with normalizing constant

\[
c_{\nu,k} = \frac{\Gamma\left(\frac{\nu+k}{2}\right)}{(\pi\nu)^{k/2}\,\Gamma\left(\frac{\nu}{2}\right)},
\]

where the Gamma function is defined by $\Gamma(z) = \int_0^\infty \exp(-t)\,t^{z-1}\,dt$. The kurtosis of the t distribution
is of course regulated by the parameter $\nu$: the smaller $\nu$, the heavier the tails. For instance, for $\nu = 1$,
we retrieve the fat-tailed Cauchy distribution. Of particular interest is the limiting case when $\nu$ tends to
infinity, which yields the multivariate Gaussian distribution with density

\[
(2\pi)^{-k/2}\,|\Sigma|^{-1/2}\exp\left(-\tfrac{1}{2}\|\Sigma^{-1/2}(x-\mu)\|^2\right), \quad x \in \mathbb{R}^k.
\]

The t model thus embeds the Gaussian distribution into a parametric class of fat-tailed distributions.
Indeed, basic calculations show that

\[
\lim_{\nu\to\infty}\left(1 + \|\Sigma^{-1/2}(x-\mu)\|^2/\nu\right)^{-\frac{\nu+k}{2}} = \exp\left(-\tfrac{1}{2}\|\Sigma^{-1/2}(x-\mu)\|^2\right)
\]

and $\lim_{\nu\to\infty} c_{\nu,k} = (2\pi)^{-k/2}$.
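Both limits can be checked numerically. The following is a minimal sketch, not part of the original paper; the function names `t_norm_const`, `gauss_norm_const` and `t_kernel` are hypothetical helpers introduced here, and only the Python standard library is assumed. The constant is computed on the log scale so that large values of $\nu$ do not overflow the Gamma function.

```python
import math


def t_norm_const(nu: float, k: int) -> float:
    """c_{nu,k} = Gamma((nu+k)/2) / ((pi*nu)^(k/2) * Gamma(nu/2)), on the log scale."""
    log_c = (math.lgamma((nu + k) / 2) - math.lgamma(nu / 2)
             - (k / 2) * math.log(math.pi * nu))
    return math.exp(log_c)


def gauss_norm_const(k: int) -> float:
    """Normalizing constant (2*pi)^(-k/2) of the k-variate Gaussian density."""
    return (2 * math.pi) ** (-k / 2)


def t_kernel(d2: float, nu: float, k: int) -> float:
    """(1 + d2/nu)^(-(nu+k)/2), where d2 stands for ||Sigma^{-1/2}(x - mu)||^2."""
    return math.exp(-(nu + k) / 2 * math.log1p(d2 / nu))


# Compare the t constant with its Gaussian limit for growing nu.
for k in (1, 2, 3):
    for nu in (1.0, 10.0, 1e3, 1e6):
        print(f"k={k} nu={nu:>9g}  c={t_norm_const(nu, k):.8f}  "
              f"limit={gauss_norm_const(k):.8f}")

# Compare the t kernel with the Gaussian kernel at squared distance 4.
print(t_kernel(4.0, 1e6, 3), math.exp(-2.0))
```

For large $\nu$ the printed constants should agree with $(2\pi)^{-k/2}$ to several digits; the direction from which they approach the limit is the subject of the rest of the paper.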

In this short paper, we are particularly interested in the latter limit. More concretely, we have
asked ourselves whether or not this convergence of $c_{\nu,k}$ towards the normalizing constant of a
Gaussian distribution occurs in a monotone way. This question arose during the preparation of a
forthcoming inferential paper involving the t distribution. We were first able to prove that $\nu \mapsto c_{\nu,1}$
is monotonically increasing to $(2\pi)^{-1/2}$. Moving on to higher dimensions, we noticed that this
monotonicity changes dramatically: indeed, in dimension two, one easily sees that $c_{\nu,2} = (2\pi)^{-1}$
whatever $\nu$, and a much less straightforward proof has allowed us to show that $\nu \mapsto c_{\nu,k}$ decreases
towards $(2\pi)^{-k/2}$ for all $k \geq 3$. See Section 2 for a formal statement (Theorem 1) and for the proof of
these claims. We were quite surprised by these findings, as we did not expect the dimension to have
any effect on the convergence of $c_{\nu,k}$ towards $(2\pi)^{-k/2}$. Further thought has led us to a nice "dimension-based"
interpretation of this result in terms of tail weight and probability mass around the center of the
t distributions, which we present in Section 3. As we shall see in that section, our interpretation sheds
some new light on the meaning of tail weight in higher dimensions and hence contributes, at least
in our opinion, to a better understanding of the concept of fat tails in a multidimensional setup.



2. Monotonicity results. 

Before establishing our monotonicity results, let us start by introducing some notation that will be useful
in the sequel. To avoid repetition, let us mention that the subsequent formulae are all valid for $x \in \mathbb{R}_0^+$.
Denoting by $D_x$ the first derivative with respect to $x$, define

\[
\psi(x) = D_x \log(\Gamma(x)),
\]

the so-called digamma function or first polygamma function. The well-known functional equation

\[
\Gamma(x + 1) = x\,\Gamma(x) \tag{1}
\]

thus allows us to obtain, by taking logarithms and differentiating,

\[
\psi(x + 1) = \psi(x) + \frac{1}{x}. \tag{2}
\]

Another interesting and useful formula is the series representation of the derivatives of $\psi$:

\[
D_x^{(n)}\psi(x) = (-1)^{n+1}\, n! \sum_{j=0}^{\infty} \frac{1}{(x+j)^{n+1}}, \quad n \geq 1. \tag{3}
\]

For an overview and proofs of these results, we refer to Artin (1964).
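Identities (2) and (3) are easy to sanity-check numerically. The sketch below is an illustration added here, not the authors' code: `digamma_fd` approximates $\psi$ by a central finite difference of `math.lgamma` (an assumption made to stay within the standard library), and `polygamma_series` truncates the infinite series in (3) after a fixed number of terms, so agreement is only expected up to roughly four decimal places.

```python
import math


def digamma_fd(x: float, h: float = 1e-5) -> float:
    """psi(x) = d/dx log Gamma(x), approximated by a central difference."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)


def polygamma_series(n: int, x: float, terms: int = 200_000) -> float:
    """Truncated series (3): (-1)^(n+1) * n! * sum_{j < terms} (x + j)^-(n+1)."""
    s = sum((x + j) ** -(n + 1) for j in range(terms))
    return (-1) ** (n + 1) * math.factorial(n) * s


x = 2.5
# Identity (2): psi(x + 1) - psi(x) should be close to 1/x.
print(digamma_fd(x + 1) - digamma_fd(x), 1 / x)
# Series (3) with n = 1 is positive (psi increasing), with n = 2 negative (psi concave).
print(polygamma_series(1, x), polygamma_series(2, x))
```

The signs of the last two printed values are what the proof of Theorem 1 relies on: a positive first derivative and a negative second derivative of $\psi$.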

Now, with this notation in hand, we are ready to state the main result of this paper, namely the
announced monotonicity result of the normalizing constants $c_{\nu,k}$.

Theorem 1 (Dimension-based monotonicity of the normalizing constants in t distributions). For $k \in \mathbb{N}_0$,
define the mapping $g_k : \mathbb{R}_0^+ \to \mathbb{R}^+$ by $g_k(\nu) = \frac{\Gamma\left(\frac{\nu+k}{2}\right)}{(\pi\nu)^{k/2}\,\Gamma\left(\frac{\nu}{2}\right)}$. We have

(i) if $k = 1$, $g_k(\nu)$ is monotonically increasing in $\nu$;
(ii) if $k = 2$, $g_k(\nu)$ is constant in $\nu$;
(iii) if $k \geq 3$, $g_k(\nu)$ is monotonically decreasing in $\nu$.
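The three regimes of Theorem 1 can be observed on a finite grid before reading the proof. This is a hedged numerical sketch, not a substitute for the proof: the helper names `g` and `trend`, the grid $\nu \in \{0.5, 1, \ldots, 50\}$ and the tolerance used to call a sequence "constant" are all choices made here for illustration.

```python
import math


def g(nu: float, k: int) -> float:
    """g_k(nu) = Gamma((nu+k)/2) / ((pi*nu)^(k/2) * Gamma(nu/2)), on the log scale."""
    return math.exp(math.lgamma((nu + k) / 2) - math.lgamma(nu / 2)
                    - (k / 2) * math.log(math.pi * nu))


def trend(k: int, grid, tol: float = 1e-12) -> str:
    """Classify nu -> g_k(nu) on the grid as constant, increasing, decreasing or mixed."""
    vals = [g(nu, k) for nu in grid]
    diffs = [b - a for a, b in zip(vals, vals[1:])]
    if all(abs(d) <= tol for d in diffs):
        return "constant"
    if all(d > 0 for d in diffs):
        return "increasing"
    if all(d < 0 for d in diffs):
        return "decreasing"
    return "mixed"


grid = [0.5 * i for i in range(1, 101)]  # nu = 0.5, 1.0, ..., 50.0
for k in range(1, 6):
    print(k, trend(k, grid))
```

A grid check of this kind can only illustrate the statement on the sampled values of $\nu$; the proof below covers all $\nu \in \mathbb{R}_0^+$.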

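Proof. (i) Basic calculus manipulations show that

\[
D_\nu(\log g_1(\nu)) = \frac{1}{2}\left(\psi\left(\frac{\nu+1}{2}\right) - \psi\left(\frac{\nu}{2}\right) - \frac{1}{\nu}\right),
\]

with $\psi$ the digamma function. By (3), we know that

\[
D_x\psi(x) = \sum_{j=0}^{\infty}\frac{1}{(x+j)^2} > 0 \quad \text{and} \quad D_x^{(2)}\psi(x) = -2\sum_{j=0}^{\infty}\frac{1}{(x+j)^3} < 0.
\]

Thus, $\psi$ is an increasing and concave function on $\mathbb{R}_0^+$. Using concavity together with identity (2), we
have in particular

\[
\psi\left(\frac{\nu+1}{2}\right) > \frac{1}{2}\psi\left(\frac{\nu}{2}\right) + \frac{1}{2}\psi\left(\frac{\nu}{2}+1\right) = \psi\left(\frac{\nu}{2}\right) + \frac{1}{\nu}.
\]

This inequality readily allows us to deduce that $\log g_1(\nu)$, and hence $g_1(\nu)$, is monotonically increasing
in $\nu$.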

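(ii) If $k = 2$, the function $g_2(\nu)$ reduces to $\frac{1}{2\pi}$ by simply applying (1): since $\Gamma(\nu/2 + 1) = (\nu/2)\,\Gamma(\nu/2)$,
we get $g_2(\nu) = (\nu/2)/(\pi\nu) = 1/(2\pi)$. In other words, it equals its limit, whence the claim.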

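(iii) Assume first that $k \geq 3$ is even, hence that $k/2$ is an integer. Using identity (1) iteratively, we
can write

\[
\Gamma\left(\frac{\nu}{2} + \frac{k}{2}\right) = \underbrace{\left(\frac{\nu}{2} + \frac{k}{2} - 1\right)\left(\frac{\nu}{2} + \frac{k}{2} - 2\right)\cdots\frac{\nu}{2}}_{k/2 \text{ factors}}\;\Gamma\left(\frac{\nu}{2}\right),
\]

which implies

\[
g_k(\nu) = \frac{\Gamma\left(\frac{\nu+k}{2}\right)}{(\pi\nu)^{k/2}\,\Gamma\left(\frac{\nu}{2}\right)} = \pi^{-k/2}\left(\frac{1}{2} + \frac{k/2-1}{\nu}\right)\left(\frac{1}{2} + \frac{k/2-2}{\nu}\right)\cdots\left(\frac{1}{2} + \frac{1}{\nu}\right)\frac{1}{2}.
\]

Since $a + b/\nu$ is monotonically decreasing in $\nu$ when $b > 0$, $g_k(\nu)$ happens to be the product of
monotonically decreasing and positive functions in $\nu$ (times a positive constant). Thus $g_k(\nu)$ is itself monotonically decreasing in
$\nu$, which allows us to conclude for $k$ even.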
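As a concrete instance of this factorization, consider $k = 4$. The following worked case is added here for illustration and uses only identity (1) and the definition of $g_k$:

```latex
\[
\Gamma\left(\frac{\nu}{2} + 2\right) = \left(\frac{\nu}{2} + 1\right)\frac{\nu}{2}\,\Gamma\left(\frac{\nu}{2}\right)
\quad\Longrightarrow\quad
g_4(\nu) = \frac{\left(\frac{\nu}{2} + 1\right)\frac{\nu}{2}}{\pi^2\nu^2}
         = \pi^{-2}\left(\frac{1}{2} + \frac{1}{\nu}\right)\frac{1}{2}
         = \frac{1}{4\pi^2} + \frac{1}{2\pi^2\nu}.
\]
```

The last expression is visibly decreasing in $\nu$ and tends to $1/(4\pi^2) = (2\pi)^{-2}$, the Gaussian normalizing constant in dimension four.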

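Now assume that $k \geq 3$ is odd. We set $k = 2m + 1$ with $m \in \mathbb{N}_0$. The proof is based on the same
idea as the proof for the one-dimensional case. One easily sees that

\[
D_\nu(\log g_k(\nu)) = \frac{1}{2}\left(\psi\left(\frac{\nu+k}{2}\right) - \psi\left(\frac{\nu}{2}\right) - \frac{k}{\nu}\right), \tag{4}
\]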



C. Ley and A. Neven/The normalizing constant in multivariate t distributions 




with $\psi$ the digamma function. In the rest of this proof, we establish the monotonicity of $g_k(\nu)$ by proving
by induction on $k$ (respectively, on $m$) that $D_\nu(\log g_k(\nu)) < 0$ for all $\nu \in \mathbb{R}_0^+$.

Base case: If $m = 1$ (which implies $k = 3$), identity (2) yields $\psi(\nu/2) = \psi(\nu/2 + 1) - 2/\nu$, hence (4)
can be rewritten as

\[
D_\nu(\log g_3(\nu)) = \frac{1}{2}\left(\psi\left(\frac{\nu+3}{2}\right) - \psi\left(\frac{\nu}{2} + 1\right) - \frac{1}{\nu}\right). \tag{5}
\]

By concavity of the digamma function, we have the inequality

\[
\psi\left(\frac{\nu}{2} + 1\right) \geq \frac{1}{2}\psi\left(\frac{\nu+1}{2}\right) + \frac{1}{2}\psi\left(\frac{\nu+3}{2}\right),
\]

and thus (5) can be bounded by

\[
\frac{1}{2}\left(\frac{1}{2}\psi\left(\frac{\nu+3}{2}\right) - \frac{1}{2}\psi\left(\frac{\nu+1}{2}\right) - \frac{1}{\nu}\right) = \frac{1}{2}\left(\frac{1}{\nu+1} - \frac{1}{\nu}\right) = -\frac{1}{2\nu(\nu+1)} < 0,
\]

where we have again used (2). So $D_\nu(\log g_3(\nu)) < 0$ for all $\nu \in \mathbb{R}_0^+$, and the claim holds for the base
case.

Induction case: Assume that the expression in (4) is negative for $k = 2m + 1$ with $m \in \mathbb{N}_0$. We now
show that the claim is true for $k' = 2(m + 1) + 1 = k + 2$. It follows once more from (2), combined with
the fact that $\nu + k > \nu$, that

\[
\psi\left(\frac{\nu+k'}{2}\right) - \psi\left(\frac{\nu}{2}\right) - \frac{k'}{\nu}
= \psi\left(\frac{\nu+k}{2}\right) + \frac{2}{\nu+k} - \psi\left(\frac{\nu}{2}\right) - \frac{k}{\nu} - \frac{2}{\nu}
< \psi\left(\frac{\nu+k}{2}\right) - \psi\left(\frac{\nu}{2}\right) - \frac{k}{\nu}
< 0,
\]

where the final inequality is due to the induction hypothesis. Thus $D_\nu(\log g_k(\nu)) < 0$ for all odd $k \geq 3$,
which concludes the proof. □

Remark 1. One easily sees that the induction-based proof also holds under slight modifications for $k \geq 4$
even, but we prefer to show this shorter proof for the even-$k$ case.

For the sake of illustration, we provide in Figure 1 the curves of the normalizing constants $g_k(\nu) = c_{\nu,k}$
for $k = 1, 2, 3$ and $4$. The respective asymptotes of course correspond to the respective limits $(2\pi)^{-k/2}$.

Fig 1. Plots of $g_k(\nu) = c_{\nu,k}$ for (a) $k = 1$, (b) $k = 2$, (c) $k = 3$, and (d) $k = 4$.

While the monotone convergence of the normalizing constants $c_{\nu,k}$ to $(2\pi)^{-k/2}$ is by no means surprising,
the fact that this monotonicity changes from increasing in dimension $k = 1$ to decreasing in dimensions
$k \geq 3$ whilst being constant in dimension $k = 2$ seems at first sight, as already mentioned in the
Introduction, puzzling. It is all the more astonishing as the moment-based kurtosis ratio between two t
distributions does not alter with the dimension. Indeed, letting, without loss of generality, $X_1$ and $X_2$ be
standardized (that is, $\mu = 0$ and $\Sigma = I_k$, the $k \times k$ identity matrix) $k$-variate random vectors following
each a t distribution with respective parameters $\nu_1$ and $\nu_2$, straightforward calculations show that, for



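$m < \min(\nu_1, \nu_2)$,

\[
\frac{\mathrm{E}[\|X_1\|^m]}{\mathrm{E}[\|X_2\|^m]}
= \frac{\frac{2\pi^{k/2}}{\Gamma(k/2)}\int_0^\infty r^{m+k-1}\, c_{\nu_1,k}\left(1 + r^2/\nu_1\right)^{-\frac{\nu_1+k}{2}} dr}{\frac{2\pi^{k/2}}{\Gamma(k/2)}\int_0^\infty r^{m+k-1}\, c_{\nu_2,k}\left(1 + r^2/\nu_2\right)^{-\frac{\nu_2+k}{2}} dr}
= \frac{\nu_1^{m/2}\,\Gamma\left(\frac{k+m}{2}\right)\Gamma\left(\frac{\nu_1-m}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}{\nu_2^{m/2}\,\Gamma\left(\frac{k+m}{2}\right)\Gamma\left(\frac{\nu_2-m}{2}\right)\Gamma\left(\frac{\nu_1}{2}\right)}
= \left(\frac{\nu_1}{\nu_2}\right)^{m/2}\frac{\Gamma\left(\frac{\nu_1-m}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_2-m}{2}\right)\Gamma\left(\frac{\nu_1}{2}\right)},
\]

which does not depend on the dimension $k$. This result can be summarized in the following proposition.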

Proposition 1. Let $X_1$ and $X_2$ be standardized $k$-variate random vectors following each a t distribution
with respective parameters $\nu_1$ and $\nu_2$. Then, for all $m < \min(\nu_1, \nu_2)$, the ratio $\mathrm{E}[\|X_1\|^m]/\mathrm{E}[\|X_2\|^m]$ does not depend
on the dimension $k$. Consequently, the moment-based kurtosis ratio or fourth standardized moment ratio

\[
\frac{\mathrm{E}[\|X_1\|^4]\,/\,\left(\mathrm{E}[\|X_1\|^2]\right)^2}{\mathrm{E}[\|X_2\|^4]\,/\,\left(\mathrm{E}[\|X_2\|^2]\right)^2}
\]

is the same for each dimension $k$, provided that $\min(\nu_1, \nu_2) > 4$.
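The dimension-independence stated in Proposition 1 can be illustrated with a few lines of code. The sketch below is not taken from the paper: it assumes the closed form $\mathrm{E}[\|X\|^m] = \nu^{m/2}\,\Gamma(\frac{k+m}{2})\Gamma(\frac{\nu-m}{2}) / (\Gamma(\frac{k}{2})\Gamma(\frac{\nu}{2}))$ for a standardized $k$-variate t vector with $m < \nu$, which follows from polar coordinates and a Beta integral, and the function names are hypothetical.

```python
import math


def radial_moment(m: float, nu: float, k: int) -> float:
    """E||X||^m for a standardized k-variate t vector (assumed closed form, needs m < nu)."""
    if not m < nu:
        raise ValueError("the m-th moment is finite only for m < nu")
    log_val = (0.5 * m * math.log(nu)
               + math.lgamma((k + m) / 2) + math.lgamma((nu - m) / 2)
               - math.lgamma(k / 2) - math.lgamma(nu / 2))
    return math.exp(log_val)


def moment_ratio(m: float, nu1: float, nu2: float, k: int) -> float:
    """E||X_1||^m / E||X_2||^m for tail parameters nu1 and nu2 in dimension k."""
    return radial_moment(m, nu1, k) / radial_moment(m, nu2, k)


def kurtosis_ratio(nu1: float, nu2: float, k: int) -> float:
    """Ratio of fourth standardized radial moments; requires min(nu1, nu2) > 4."""
    def kurt(nu: float) -> float:
        return radial_moment(4, nu, k) / radial_moment(2, nu, k) ** 2
    return kurt(nu1) / kurt(nu2)


# The printed ratios should not change with the dimension k.
for k in range(1, 7):
    print(k, moment_ratio(3, 5.0, 9.0, k), kurtosis_ratio(5.0, 9.0, k))
```

Each individual moment does depend on $k$ through $\Gamma(\frac{k+m}{2})/\Gamma(\frac{k}{2})$; it is only in the ratio that this factor cancels.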

Theorem 1 in conjunction with Proposition 1 seems counterintuitive. However, a finer analysis of the
density curves or, more precisely, of the pseudo-density (that is, density without the normalizing constant)
curves allows us to grasp the reasons behind the changing monotonicity in the normalizing constant, and
leads to an interesting interpretation in terms of tail weight and probability mass around the center; see
the next section.
3. Interpretation of the monotonicity results.

First recall that the normalizing constants $c_{\nu,k}$ are nothing else but $1/\int_{\mathbb{R}^k}\left(1 + \|x\|^2/\nu\right)^{-\frac{\nu+k}{2}} dx$. Thus,
translating Theorem 1 in terms of this integral gives

• if $k = 1$, $\int_{\mathbb{R}}\left(1 + x^2/\nu\right)^{-\frac{\nu+1}{2}} dx$ is monotonically decreasing in $\nu$;

• if $k = 2$, $\int_{\mathbb{R}^2}\left(1 + \|x\|^2/\nu\right)^{-\frac{\nu+2}{2}} dx$ is constant in $\nu$;

• if $k \geq 3$, $\int_{\mathbb{R}^k}\left(1 + \|x\|^2/\nu\right)^{-\frac{\nu+k}{2}} dx$ is monotonically increasing in $\nu$.
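These integrals can also be approximated directly, without going through the Gamma function. The sketch below is an independent check added for illustration, under several assumptions of our own: the integral is reduced to a radial one using the surface area $2\pi^{k/2}/\Gamma(k/2)$ of the unit sphere, the half-line is mapped to $(0,1)$ by the substitution $r = t/(1-t)$, and a plain midpoint rule is used. It is only intended for $\nu \geq 1$, where the transformed integrand stays bounded.

```python
import math


def pseudo_density_integral(nu: float, k: int, n: int = 20_000) -> float:
    """Midpoint-rule estimate of the integral over R^k of (1 + |x|^2/nu)^(-(nu+k)/2)."""
    surface = 2 * math.pi ** (k / 2) / math.gamma(k / 2)  # area of the unit sphere in R^k
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        r = t / (1 - t)                    # maps (0, 1) onto (0, infinity)
        jacobian = 1.0 / (1 - t) ** 2      # dr/dt
        total += r ** (k - 1) * (1 + r * r / nu) ** (-(nu + k) / 2) * jacobian
    return surface * h * total


def inverse_norm_const(nu: float, k: int) -> float:
    """1 / c_{nu,k} from the closed-form expression of the normalizing constant."""
    return (math.pi * nu) ** (k / 2) * math.gamma(nu / 2) / math.gamma((nu + k) / 2)


for k in (1, 2, 3):
    for nu in (1.0, 2.0, 10.0):
        print(k, nu, pseudo_density_integral(nu, k), inverse_norm_const(nu, k))
```

On these values the numerical integrals should decrease with $\nu$ for $k = 1$, stay at $2\pi$ for $k = 2$, and increase with $\nu$ for $k = 3$, in line with the three bullet points.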

Since multivariate t distributions are heavy-tailed alternatives to the multivariate Gaussian distribution,
one could expect that, even deprived of their normalizing constant, the integrals of their pseudo-densities
$\left(1 + \|x\|^2/\nu\right)^{-\frac{\nu+k}{2}}$ are larger than those of the multivariate Gaussian pseudo-density $\exp\left(-\tfrac{1}{2}\|x\|^2\right)$. In
other words, the monotone decreasing behavior of the one-dimensional setup corresponds to what one
would normally expect. However, our results prove this intuition to be wrong as soon as $k \geq 2$. In order to provide more
insight into the intricacies of this unexpected outcome, we give in Figure 2 the plot of the pseudo-densities
of $f_{0,I_k,\nu}$ for $k = 1, 2, 3, 4$ and $\nu = 1, 2, 10$ and $\infty$ (that is, the Gaussian pseudo-density), where, for
$k \geq 2$, we have plotted the pseudo-densities along a given axis (here the first vector of the canonical basis
of $\mathbb{R}^k$, but the spherical symmetry of our $f_{0,I_k,\nu}$ ensures that the choice of axis is not relevant). For
the sake of illustration, we show the behavior of the pseudo-densities around the center (for the interval
$(-3, 3)$) and in the tails (for the interval $(3, 6)$).

Visual inspection of Figure 2 reveals that, as the dimension $k$ grows, the dominance (in terms of
probability mass) of the Gaussian around the center becomes stronger, whereas its inferiority in the
tails is dampened. More generally, it is clear that, as $\nu$ increases, more probability mass becomes
amassed around the center, while the tail weight decreases. The message our plots convey is the following:
as the dimension increases, the first effect is further reinforced, while the second one is reduced. Of
course, the latter reduction is partly offset by the dimension increase itself, as the decrease in tail
weight along each axis, though smaller, becomes amplified by the larger space.
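The two effects read off Figure 2 can be spot-checked by comparing the t and Gaussian pseudo-densities at one central and one tail point along an axis. The sketch below is only an illustration under choices made here ($\nu = 2$, the points $x = 1$ and $x = 5$, and the helper name `ratio_to_gaussian`); it does not reproduce the figure.

```python
import math


def t_pseudo(x: float, nu: float, k: int) -> float:
    """t pseudo-density (1 + x^2/nu)^(-(nu+k)/2) along a coordinate axis of R^k."""
    return (1 + x * x / nu) ** (-(nu + k) / 2)


def gauss_pseudo(x: float) -> float:
    """Gaussian pseudo-density exp(-x^2/2) along a coordinate axis."""
    return math.exp(-0.5 * x * x)


def ratio_to_gaussian(x: float, nu: float, k: int) -> float:
    """t pseudo-density divided by the Gaussian pseudo-density at the same point."""
    return t_pseudo(x, nu, k) / gauss_pseudo(x)


nu = 2.0
for k in (1, 2, 3, 4):
    centre = ratio_to_gaussian(1.0, nu, k)  # below 1: the Gaussian dominates near the center
    tail = ratio_to_gaussian(5.0, nu, k)    # above 1: the t kernel dominates in the tail
    print(f"k={k}  centre ratio={centre:.4f}  tail ratio={tail:.1f}")
```

With these choices the central ratio stays below one and shrinks as $k$ grows, while the tail ratio stays above one but also shrinks, which matches the description of a stronger Gaussian dominance at the center and a dampened inferiority in the tails.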

These findings about the pseudo-densities, combined with the changing monotonicity of the normalizing
constant, lead, in our view, to a better understanding of the multivariate t distributions themselves.
Indeed, the "loss in peakedness" around the center of the Gaussian distribution compared to heavier-tailed
t distributions when the dimension increases, as can be seen from the plots of the (true) densities of
multivariate t distributions (see Figure 3), now becomes more understandable: in higher dimensions, the
Gaussian pseudo-densities are multiplied by a smaller normalizing constant than the t pseudo-densities.
The tails, when varying the tail weight parameter $\nu$, also tend to get closer when we go higher in
dimensions. These conclusions, combined with the dimension-independent moment-based kurtosis ratio,
finally reveal that, when thinking about heavy-tailed distributions in high dimensions, one should not
only expect to see large differences in the tails but also take into account that the space on which these
differences take place becomes larger as the dimension increases, which amplifies the differences.
Fig 2. Plots of the central behavior (left sub-figures) and the tail behavior
(right sub-figures) of the t pseudo-densities $\left(1 + \|x\|^2/\nu\right)^{-\frac{\nu+k}{2}}$ for dimensions
$k = 1$ ((a) and (b)), $k = 2$ ((c) and (d)), $k = 3$ ((e) and (f)), and
$k = 4$ ((g) and (h)). Within each sub-figure, we have chosen four values for
the tail parameter $\nu$: 1 (blue curves), 2 (red curves), 10 (yellow curves)
and $\infty$ (green curves). From dimension 2 onwards, the pseudo-densities are
plotted along the first vector of the canonical basis of $\mathbb{R}^k$.
Fig 3. Plots of the $k$-variate t densities $c_{\nu,k}\left(1 + \|x\|^2/\nu\right)^{-\frac{\nu+k}{2}}$ for dimensions
(a) $k = 1$, (b) $k = 2$, (c) $k = 3$, and (d) $k = 4$. Within each sub-figure,
we have chosen four values for the tail parameter $\nu$: 1 (blue curves), 2 (red
curves), 10 (yellow curves) and $\infty$ (green curves). From dimension 2 onwards,
the densities are plotted along the first vector of the canonical basis of $\mathbb{R}^k$.